-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gnrc_tcp: Overhaul all debug messages #15209
gnrc_tcp: Overhaul all debug messages #15209
Conversation
cb628c7
to
560cf6e
Compare
I don't like that so much functionality is hidden in macros, but in general I am not against introducing debug features if it helps you maintaining the code. Regarding testing: I have a failure on diff --git a/tests/gnrc_tcp/tests/05-garbage-pkts.py b/tests/gnrc_tcp/tests/05-garbage-pkts.py
index ce92a22c63..89bae5c1bf 100755
--- a/tests/gnrc_tcp/tests/05-garbage-pkts.py
+++ b/tests/gnrc_tcp/tests/05-garbage-pkts.py
@@ -140,7 +140,7 @@ if __name__ == "__main__":
if type(getattr(script, t)).__name__ == "function"
and t.startswith("test_")]
for test in tests:
- res = run(test, timeout=10, echo=False)
+ res = run(test, timeout=10, echo=True)
if res != 0:
sys.exit(res)
print(os.path.basename(sys.argv[0]) + ": success\n") I get
|
I am not a huge fan of excessive macro use either, however this is more or less the only way to produce messages without having to write things like FILE or LINE through the codebase. Using Macros here is the most visually pleasing approach from my point of view. I was able to reproduce the failure in "test_send_ack_instead_of_syn", I'll look into that, do you have any Ideas where to look? I am a bit puzzled that this Issue occurs with this change... Does this happen on a non-native board as well? |
With non-native (
Maybe a timing issue as |
560cf6e
to
098acc8
Compare
I am fairly confident that it is not the TCP module. I used wireshark find differences between "test_send_ack_instead_of_syn" with enabled and disabled Debug prints. It is rather interesting. In theory, the test tries so send an ACK packet instead of a SYN packet on connection establishment. Afterwards it sends a SYN to establish a connection normally. In reality, the test does not send a single ACK packet, instead the host system tries to send the defunct packet over and and over and over again before sending a valid SYN packet. With disabled prints GNRC_TCP received the valid SYN after some time, with enabled prints the test runs in a timeout (maybe due to the stdio delay caused by large amounts of debug prints caused by the huge amounts of received ACK packets). If I prevent the test from retransmiting the initial ACK, the issue should be solved. EDIT: Interestingly the amount of sent ACKs is the same with enabled and disabled debug messages, however there is no SYN packet sent with enabled debug print, this is quite mysterious to me. Here are the wireshark captures of this specific test case. FOLLOWUP EDIT: In theory the reply from GNRC_TCP to the malformed ACK should be a reset packet to be send, this does not happen. In addition, the are no prints from the eventloop that a packet was received from the network layer. It looks like the malformed packet is discarded before the GNRC_TCP processes it. This leads to a missing reset sent in response leading to the flood of retransmissions. |
Let me have a look first, if this is testing a regression. |
According to the comment in that test #12001 is the issue that caused it, which explicitly mentions that the garbled ACK needs to be sent multiple times to cause the issue, so I don't think it's a good idea to prevent the multiple retransmissions. |
This is exactly what the test is for: # see https://github.com/RIOT-OS/RIOT/pull/12001
provided_data = base64.b64decode("rwsQf2pekYLaU+exUBBwgPDKAAA=")
tcp_hdr = TCP(provided_data)
assert provided_data == raw(tcp_hdr)
# set destination port to application specific port
tcp_hdr.dport = dst_port
sendp(Ether(dst=dst_l2) / IPv6(src=src_ll, dst=dst_ll) / tcp_hdr,
iface=src_if, verbose=0, count=1000) is sending the garbled packages as found in #12001, # check if server actually still works
with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as sock:
sock.settimeout(child.timeout)
addr_info = socket.getaddrinfo(dst_ll + '%' + src_if, dst_port,
type=socket.SOCK_STREAM)
sock.connect(addr_info[0][-1])
child.expect_exact('gnrc_tcp_open_passive: returns 0')
verify_pktbuf_empty(child) checks if the |
Alternatively we can set a |
Never said it was, timeout configuration issues are usually found in the tests if they are too tight :-) |
After looking into the test, it is not automatic retransmissions that send the packet a thousand times, it is the tests intention to do so. I am okay with that however none of these packet is processed by the eventloop. The first packet that is actually processed by GNRC_TCP is the packet containing a SYN. From my point of view it is okay to send the malformed packet over and over, however these packets should at least be processed by GNRC_TCP code. I am not sure that causes the packets to be discarded, maybe an invalid checksum or something like that. The tests TCP header is constructed from static data maybe the checksum is invalid due changing IPv6 addresses in use. |
I think that it is not processed is intentional, as if they would be it would cause the fault #12001 points out ;-). It's a regression test. It does not have to make sense for the current implementation, it is mainly their so a bug does not return. I am not 100% sure what you mean by "is not processed by the eventloop". Do you mean, the TCP module does not even receive those packets or do you mean they are dropped right after reception (you mentioned invalid checksums, which would be checked by TCP, as IPv6 does not have any checksums) |
After sending 10 instead of 1000 packets I see that debug messages to show up and I see that packet are infact discarded by the TCP layer due to invalid checksums instead of containing an ACK (this is checked later) on connection establishment. This is good to know that the packets arrive. However this test does not test what its supposed to test, because invalid checksum should trigger different behavior that false flags. |
#12001 shows that this did not use to be the case, so if this is not the case now, it is testing what it is supposed to test, isn't it? |
It seems the network interface is just overloaded with messages. There are a lot of
|
Jeah but It still doesn't explain why the test works with disabled debug messaging. The load on the network interface is the same in both cases. |
Ah I missclicked. Sorry. |
Of course :) |
BTW, while we are talking I try to get the test to run. Most of what I write is just brainstorming new ideas ;-). |
The fix was here, btw. https://github.com/RIOT-OS/RIOT/pull/11999/files |
As a potential solution: What do you think? |
Setting |
If you are unable to recreate the memory leak, this might have something todo with the packet discard to an invalid checksum before the leak should occur. ;) |
It does not trigger the regression, but neither does |
This is the fix I came up with, if you want to apply it already here: diff --git a/tests/gnrc_tcp/tests/05-garbage-pkts.py b/tests/gnrc_tcp/tests/05-garbage-pkts.py
index ce92a22c63..231f4e996a 100755
--- a/tests/gnrc_tcp/tests/05-garbage-pkts.py
+++ b/tests/gnrc_tcp/tests/05-garbage-pkts.py
@@ -7,6 +7,7 @@
# directory for more details.
import base64
+import pexpect
import os
import socket
import sys
@@ -96,6 +97,15 @@ def test_short_header(child, src_if, src_ll, dst_if, dst_l2, dst_ll, dst_port):
@testfunc
def test_send_ack_instead_of_syn(child, src_if, src_ll,
dst_if, dst_l2, dst_ll, dst_port):
+ # check if gnrc_tcp_open_passive in `@testfunc` decorator generates debug
+ # output
+ debug = child.expect([pexpect.TIMEOUT,
+ 'GNRC_TCP: Enter "gnrc_tcp_open_passive"'])
+ # with debug output the node can be overwhelmed by packets + output
+ if debug:
+ count = 10
+ else:
+ count = 1000
# see https://github.com/RIOT-OS/RIOT/pull/12001
provided_data = base64.b64decode("rwsQf2pekYLaU+exUBBwgPDKAAA=")
tcp_hdr = TCP(provided_data)
@@ -103,7 +113,7 @@ def test_send_ack_instead_of_syn(child, src_if, src_ll,
# set destination port to application specific port
tcp_hdr.dport = dst_port
sendp(Ether(dst=dst_l2) / IPv6(src=src_ll, dst=dst_ll) / tcp_hdr,
- iface=src_if, verbose=0, count=1000)
+ iface=src_if, verbose=0, count=count)
# check if server actually still works
with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as sock: It just checks if there is debug output and sets |
The eventloop prints output on thread startup way before sendp is executed. Maybe your fix can react to that. |
Then it would be like this ;-) diff --git a/tests/gnrc_tcp/tests/05-garbage-pkts.py b/tests/gnrc_tcp/tests/05-garbage-pkts.py
index ce92a22c63..d6de7cadea 100755
--- a/tests/gnrc_tcp/tests/05-garbage-pkts.py
+++ b/tests/gnrc_tcp/tests/05-garbage-pkts.py
@@ -7,6 +7,7 @@
# directory for more details.
import base64
+import pexpect
import os
import socket
import sys
@@ -96,6 +97,15 @@ def test_short_header(child, src_if, src_ll, dst_if, dst_l2, dst_ll, dst_port):
@testfunc
def test_send_ack_instead_of_syn(child, src_if, src_ll,
dst_if, dst_l2, dst_ll, dst_port):
+ # check if entering _event_loop in `@testfunc` decorator generates debug
+ # output
+ debug = child.expect([pexpect.TIMEOUT,
+ 'GNRC_TCP: Enter "_event_loop"'])
+ # with debug output the node can be overwhelmed by packets + output
+ if debug:
+ count = 10
+ else:
+ count = 1000
# see https://github.com/RIOT-OS/RIOT/pull/12001
provided_data = base64.b64decode("rwsQf2pekYLaU+exUBBwgPDKAAA=")
tcp_hdr = TCP(provided_data)
@@ -103,7 +113,7 @@ def test_send_ack_instead_of_syn(child, src_if, src_ll,
# set destination port to application specific port
tcp_hdr.dport = dst_port
sendp(Ether(dst=dst_l2) / IPv6(src=src_ll, dst=dst_ll) / tcp_hdr,
- iface=src_if, verbose=0, count=1000)
+ iface=src_if, verbose=0, count=count)
# check if server actually still works
with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as sock: |
I added your provided patch and modified the expect condition. Currently the test code expects "GNRC_TCP" to increase the change to detect debug prints if ENABLE_DEBUG is not set in the gnrc_tcp_eventloop.c. |
Since fixing the regression test is a seperate issue, are you happy with this PR? |
3ea8961
to
55314bd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor issues, otherwise I think we are good to go.
57c8be3
to
0b2501f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK, please squash
0b2501f
to
d2ec12d
Compare
Please squash. |
aa1b7d0
to
537c723
Compare
Done. Thanks for your patience, support and help while tracing down these obscure issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK. Thank you too :-)
537c723
to
0cc06a5
Compare
Contribution description
This PR streamlines all Debug messaging in GNRC_TCP, by adding 4 dedicated macros using DEBUG.
Each macro references the function name, file name and line number to the given debug message allowing fast localization of the message. Each macro has a dedicated Role:
Testing procedure
All gnrc_tcp tests muss be successful with and without enabled debug messages.
Issues/PRs references
None